Designing a New Bloom Filter-based Index for Distributed Data Management
Abstract
Distributed architectures are now widely used in Internet applications. A key technique in such systems is maintaining an index data structure that records which elements each node holds. The Bloom filter (BF) is a popular solution: its simple mathematical structure offers a fast, space-efficient probabilistic representation of set membership. In many Internet applications, user access to items follows Zipf's law, with a small number of items attracting a disproportionate share of visits. Based on this observation, we propose a selective insertion method for Bloom filters that reduces the load on the BFs by finding an optimal load ratio. Experiments show that the new approach reduces false lookup time by 36% compared with the pure Bloom filter approach.
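The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's algorithm, so everything below is an assumption made for illustration: the `BloomFilter` class, the double-hashing scheme built on SHA-256, and the hypothetical `build_selective_filter` policy, which inserts only the most-accessed fraction (`load_ratio`) of a node's items. How the paper actually chooses the optimal load ratio is not shown here.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: m bits, k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct for this filter.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_selective_filter(access_counts: dict[str, int], load_ratio: float,
                           m: int, k: int) -> tuple[BloomFilter, list[str]]:
    """Hypothetical policy: insert only the top `load_ratio` share of items,
    ranked by access count (a stand-in for the paper's selective insertion)."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    keep = ranked[: max(1, int(load_ratio * len(ranked)))]
    bf = BloomFilter(m, k)
    for item in keep:
        bf.add(item)
    return bf, keep
```

With Zipf-like access counts, a small `load_ratio` covers most lookups while setting far fewer bits, which lowers the false positive rate of the filter. Items left out of the filter would need some fallback lookup path, which the abstract does not describe.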
Similar resources
A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems
The indexing technique in distributed object storage system is the crucial part of a large scale application, where the index data structure may be published in many nodes. Here arises a problem on preserving the privacy of the ownership information while supporting queries on item locations with limited index space. Probabilistic data structure, such as the bloom filter which records the locat...
A Cuckoo Filter Modification Inspired by Bloom Filter
Probabilistic data structures are popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space-efficient models used in the set membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...
BSI: Bloom Filter-based Semantic Indexing for Unstructured P2P Networks
Resource management and search are very important yet challenging in large-scale distributed systems like P2P networks. Most existing P2P systems rely on indexing to efficiently route queries over the network. However, searches based on such indices face two key issues. First, the majority of existing search schemes rely on simple keyword-based indices that can only support exact string based m...
LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immu...
COCA Filters: Co-occurrence Aware Bloom Filters
We propose an indexing data structure based on a novel variation of Bloom filters. Signature files have been proposed in the past as a method to index large text databases though they suffer from a high false positive error problem. In this paper we introduce COCA Filters, a new type of Bloom filters which exploits the co-occurrence probability of words in documents to reduce the false positive...
Publication date: 2014